Goto

Collaborating Authors

 Dayton


MAGPI: Multifidelity-Augmented Gaussian Process Inputs for Surrogate Modeling from Scarce Data

Rex, Atticus, Qian, Elizabeth, Peterson, David

arXiv.org Machine Learning

Supervised machine learning describes the practice of fitting a parameterized model to labeled input-output data. Supervised machine learning methods have demonstrated promise in learning efficient surrogate models that can (partially) replace expensive high-fidelity models, making many-query analyses, such as optimization, uncertainty quantification, and inference, tractable. However, when training data must be obtained through the evaluation of an expensive model or experiment, the amount of training data that can be obtained is often limited, which can make learned surrogate models unreliable. However, in many engineering and scientific settings, cheaper \emph{low-fidelity} models may be available, for example arising from simplified physics modeling or coarse grids. These models may be used to generate additional low-fidelity training data. The goal of \emph{multifidelity} machine learning is to use both high- and low-fidelity training data to learn a surrogate model which is cheaper to evaluate than the high-fidelity model, but more accurate than any available low-fidelity model. This work proposes a new multifidelity training approach for Gaussian process regression which uses low-fidelity data to define additional features that augment the input space of the learned model. The approach unites desirable properties from two separate classes of existing multifidelity GPR approaches, cokriging and autoregressive estimators. Numerical experiments on several test problems demonstrate both increased predictive accuracy and reduced computational cost relative to the state of the art.




Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation

Ma, Zhipeng, Bahja, Ali Rida, Burgdorf, Andreas, Pomp, André, Meisen, Tobias, Jørgensen, Bo Nørregaard, Ma, Zheng Grace

arXiv.org Artificial Intelligence

Enhancing fuel efficiency in public transportation requires the integration of complex multimodal data into interpretable, decision-relevant insights. However, traditional analytics and visualization methods often yield fragmented outputs that demand extensive human interpretation, limiting scalability and consistency. This study presents a multi-agent framework that leverages multimodal large language models (LLMs) to automate data narration and energy insight generation. The framework coordinates three specialized agents, including a data narration agent, an LLM-as-a-judge agent, and an optional human-in-the-loop evaluator, to iteratively transform analytical artifacts into coherent, stakeholder-oriented reports. The system is validated through a real-world case study on public bus transportation in Northern Jutland, Denmark, where fuel efficiency data from 4006 trips are analyzed using Gaussian Mixture Model clustering. Comparative experiments across five state-of-the-art LLMs and three prompting paradigms identify GPT-4.1 mini with Chain-of-Thought prompting as the optimal configuration, achieving 97.3% narrative accuracy while balancing interpretability and computational cost. The findings demonstrate that multi-agent orchestration significantly enhances factual precision, coherence, and scalability in LLM-based reporting. The proposed framework establishes a replicable and domain-adaptive methodology for AI-driven narrative generation and decision support in energy informatics.


Binary perceptron computational gap -- a parametric fl RDT view

Stojnic, Mihailo

arXiv.org Machine Learning

Recent studies suggest that asymmetric binary perceptron (ABP) likely exhibits the so-called statistical-computational gap characterized with the appearance of two phase transitioning constraint density thresholds: \textbf{\emph{(i)}} the \emph{satisfiability threshold} $α_c$, below/above which ABP succeeds/fails to operate as a storage memory; and \textbf{\emph{(ii)}} \emph{algorithmic threshold} $α_a$, below/above which one can/cannot efficiently determine ABP's weight so that it operates as a storage memory. We consider a particular parametric utilization of \emph{fully lifted random duality theory} (fl RDT) [85] and study its potential ABP's algorithmic implications. A remarkable structural parametric change is uncovered as one progresses through fl RDT lifting levels. On the first two levels, the so-called $\c$ sequence -- a key parametric fl RDT component -- is of the (natural) decreasing type. A change of such phenomenology on higher levels is then connected to the $α_c$ -- $α_a$ threshold change. Namely, on the second level concrete numerical values give for the critical constraint density $α=α_c\approx 0.8331$. While progressing through higher levels decreases this estimate, already on the fifth level we observe a satisfactory level of convergence and obtain $α\approx 0.7764$. This allows to draw two striking parallels: \textbf{\emph{(i)}} the obtained constraint density estimate is in a remarkable agrement with range $α\in (0.77,0.78)$ of clustering defragmentation (believed to be responsible for failure of locally improving algorithms) [17,88]; and \textbf{\emph{(ii)}} the observed change of $\c$ sequence phenomenology closely matches the one of the negative Hopfield model for which the existence of efficient algorithms that closely approach similar type of threshold has been demonstrated recently [87].


Parameter Averaging in Link Prediction

Sapkota, Rupesh, Demir, Caglar, Sharma, Arnab, Ngomo, Axel-Cyrille Ngonga

arXiv.org Artificial Intelligence

Ensemble methods are widely employed to improve generalization in machine learning. This has also prompted the adoption of ensemble learning for the knowledge graph embedding (KGE) models in performing link prediction. Typical approaches to this end train multiple models as part of the ensemble, and the diverse predictions are then averaged. However, this approach has some significant drawbacks. For instance, the computational overhead of training multiple models increases latency and memory overhead. In contrast, model merging approaches offer a promising alternative that does not require training multiple models. In this work, we introduce model merging, specifically weighted averaging, in KGE models. Herein, a running average of model parameters from a training epoch onward is maintained and used for predictions. To address this, we additionally propose an approach that selectively updates the running average of the ensemble model parameters only when the generalization performance improves on a validation dataset. We evaluate these two different weighted averaging approaches on link prediction tasks, comparing the state-of-the-art benchmark ensemble approach. Additionally, we evaluate the weighted averaging approach considering literal-augmented KGE models and multi-hop query answering tasks as well. The results demonstrate that the proposed weighted averaging approach consistently improves performance across diverse evaluation settings.


Social Simulations with Large Language Model Risk Utopian Illusion

Bian, Ning, Han, Xianpei, Lin, Hongyu, Wu, Baolei, Wang, Jun

arXiv.org Artificial Intelligence

Reliable simulation of human behavior is essential for explaining, predicting, and intervening in our society. Recent advances in large language models (LLMs) have shown promise in emulating human behaviors, interactions, and decision-making, offering a powerful new lens for social science studies. However, the extent to which LLMs diverge from authentic human behavior in social contexts remains underexplored, posing risks of misinterpretation in scientific studies and unintended consequences in real-world applications. Here, we introduce a systematic framework for analyzing LLMs' behavior in social simulation. Our approach simulates multi-agent interactions through chatroom-style conversations and analyzes them across five linguistic dimensions, providing a simple yet effective method to examine emergent social cognitive biases. We conduct extensive experiments involving eight representative LLMs across three families. Our findings reveal that LLMs do not faithfully reproduce genuine human behavior but instead reflect overly idealized versions of it, shaped by the social desirability bias. In particular, LLMs show social role bias, primacy effect, and positivity bias, resulting in "Utopian" societies that lack the complexity and variability of real human interactions. These findings call for more socially grounded LLMs that capture the diversity of human social behavior.


Neural Reasoning for Robust Instance Retrieval in $\mathcal{SHOIQ}$

Teyou, Louis Mozart Kamdem, Friedrichs, Luke, Kouagou, N'Dah Jean, Demir, Caglar, Mahmood, Yasir, Heindorf, Stefan, Ngomo, Axel-Cyrille Ngonga

arXiv.org Artificial Intelligence

Concept learning exploits background knowledge in the form of description logic axioms to learn explainable classification models from knowledge bases. Despite recent breakthroughs in neuro-symbolic concept learning, most approaches still cannot be deployed on real-world knowledge bases. This is due to their use of description logic reasoners, which are not robust against inconsistencies nor erroneous data. We address this challenge by presenting a novel neural reasoner dubbed EBR. Our reasoner relies on embeddings to approximate the results of a symbolic reasoner. We show that EBR solely requires retrieving instances for atomic concepts and existential restrictions to retrieve or approximate the set of instances of any concept in the description logic $\mathcal{SHOIQ}$. In our experiments, we compare EBR with state-of-the-art reasoners. Our results suggest that EBR is robust against missing and erroneous data in contrast to existing reasoners.


Explainable Benchmarking through the Lense of Concept Learning

Zhang, Quannian, Röder, Michael, Srivastava, Nikit, Kouagou, N'Dah Jean, Ngomo, Axel-Cyrille Ngonga

arXiv.org Artificial Intelligence

Evaluating competing systems in a comparable way, i.e., benchmarking them, is an undeniable pillar of the scientific method. However, system performance is often summarized via a small number of metrics. The analysis of the evaluation details and the derivation of insights for further development or use remains a tedious manual task with often biased results. Thus, this paper argues for a new type of benchmarking, which is dubbed explainable benchmarking. The aim of explainable benchmarking approaches is to automatically generate explanations for the performance of systems in a benchmark. We provide a first instantiation of this paradigm for knowledge-graph-based question answering systems. We compute explanations by using a novel concept learning approach developed for large knowledge graphs called PruneCEL. Our evaluation shows that PruneCEL outperforms state-of-the-art concept learners on the task of explainable benchmarking by up to 0.55 points F1 measure. A task-driven user study with 41 participants shows that in 80\% of the cases, the majority of participants can accurately predict the behavior of a system based on our explanations. Our code and data are available at https://github.com/dice-group/PruneCEL/tree/K-cap2025


Codec Avatar Studio: Paired Human Captures for Complete, Driveable, and Generalizable Avatars

Neural Information Processing Systems

To create photorealistic avatars that users can embody, human modeling must be complete (encompass the full body), driveable (able to reproduce motion of the user from lightweight sensors), and generalizable ( i.e., easily adaptable to novel identities). Towards these goals, paired captures, that is, captures of the same subject obtained from systems of diverse quality and availability, are crucial. However, paired captures are rarely available to researchers outside of dedicated industrial labs: Codec Avatar Studio is our proposal to close this gap. Towards generalization and driveability, we introduce a dataset of 256 subjects captured in two modalities: high resolution multi-view scans of their heads, and video from the internal cameras of a headset.